TREES
Photo by Kate Sade on Unsplash
…in our increasingly global economy, highly skilled foreign workers are certain to be in a position to make unique contributions to the U.S. economy…
— Senate Judiciary Committee 2000
The H-1B is a visa in the United States under the Immigration and Nationality Act, section 101(a)(15)(H), that allows U.S. employers to temporarily employ foreign workers in specialty occupations. A specialty occupation requires the application of specialized knowledge and a bachelor’s degree or equivalent work experience. The duration of stay is three years, extendable to six years, after which the visa holder may need to reapply. Laws limit the number of H-1B visas issued each year; 188,100 new and initial H-1B visas were issued in 2019.
To illustrate the circle packing layout, and the custom configuration needed to manage the large number of data points, the code below demonstrates the key features:
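# Packages used below: dplyr (wrangling), igraph (graph structure),
# ggplot2 and ggraph (plotting), ggiraph (interactive SVG output)
library(dplyr)
library(igraph)
library(ggplot2)
library(ggraph)
library(ggiraph)
# Load the 2019 H-1B employer data
df_file_path <- "archetypes/h-1b-employers/2019-H-1B-Employers.csv"
df <- read.csv(df_file_path, header = TRUE, stringsAsFactors = FALSE)
df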
# Add a row ID and the total approvals (initial + continuing) per row
df_wrangle <- df %>%
  mutate(ID = row_number(),
         Approvals = as.integer(Initial.Approvals) + as.integer(Continuing.Approvals))
# Complete cases: keep rows with a non-empty State, Employer, and ZIP
df_wrangle <- filter(df_wrangle, nchar(State) > 0)
df_wrangle <- filter(df_wrangle, nchar(Employer) > 0)
df_wrangle <- filter(df_wrangle, nchar(ZIP) > 0)
# Keep rows with more than two approvals; limits the node count for performance
df_wrangle <- filter(df_wrangle, Approvals > 2)
# create a unique node id to collapse three level hierarchy
df_wrangle <- df_wrangle %>% mutate(EMP_UNIQ = paste0(Employer, "_", State, "_", ZIP))
# unique edges
df_edges <- aggregate(x = df_wrangle$Approvals,
                      by = list(df_wrangle$State, df_wrangle$EMP_UNIQ),
                      FUN = sum)
# standard edge table structure
colnames(df_edges) <- c("FROM", "TO", "SIZE")
# df_edges
# root nodes
df_nodes_1 <- aggregate(x = df_wrangle$Approvals,
                        by = list(df_wrangle$State),
                        FUN = sum)
colnames(df_nodes_1) <- c("NODE", "SIZE")
# df_nodes_1
# leaf nodes
# df_nodes_2 <- df_wrangle %>% select(EMP_UNIQ, Approvals)
df_nodes_2 <- aggregate(x = df_wrangle$Approvals,
                        by = list(df_wrangle$EMP_UNIQ),
                        FUN = sum)
colnames(df_nodes_2) <- c("NODE", "SIZE")
# df_nodes_2
# Combine
df_nodes <- rbind(df_nodes_1, df_nodes_2)
# Transform to graph data structure
df_graph <- graph_from_data_frame( df_edges, vertices = df_nodes )
df_edges
df_nodes
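As a sanity check on the table-building steps above, here is a minimal sketch in base R on made-up records (the `toy_*` names, employers, and counts are all hypothetical, not taken from the H-1B file). It shows why the employer ID is suffixed with state and ZIP: the same employer name can appear under several states, and a leaf with two parents would no longer form a hierarchy. It also checks two things `graph_from_data_frame()` expects when a vertex table is supplied: every name in the edge table appears in the node table, and node names are unique.

```r
# Hypothetical toy records standing in for the wrangled H-1B data
toy <- data.frame(
  State     = c("CA", "CA", "CA", "NY"),
  Employer  = c("ACME", "ACME", "GLOBEX", "ACME"),
  ZIP       = c("94105", "94105", "95014", "10001"),
  Approvals = c(5L, 7L, 3L, 4L),
  stringsAsFactors = FALSE
)
# Same ID scheme as the article: Employer_State_ZIP
toy$EMP_UNIQ <- paste0(toy$Employer, "_", toy$State, "_", toy$ZIP)

# "ACME" alone is ambiguous across states; the suffixed ID is not
fewer_raw_names <- length(unique(toy$Employer)) < length(unique(toy$EMP_UNIQ))

# Edge table: one row per State -> employer-location pair
toy_edges <- aggregate(x = toy$Approvals,
                       by = list(toy$State, toy$EMP_UNIQ),
                       FUN = sum)
colnames(toy_edges) <- c("FROM", "TO", "SIZE")

# Node table: roots (states) stacked on leaves (employer locations)
toy_roots  <- aggregate(x = toy$Approvals, by = list(toy$State), FUN = sum)
toy_leaves <- aggregate(x = toy$Approvals, by = list(toy$EMP_UNIQ), FUN = sum)
colnames(toy_roots)  <- c("NODE", "SIZE")
colnames(toy_leaves) <- c("NODE", "SIZE")
toy_nodes <- rbind(toy_roots, toy_leaves)

# Every edge endpoint must be a known node, and node names must be unique
edge_names_ok     <- all(c(toy_edges$FROM, toy_edges$TO) %in% toy_nodes$NODE)
node_names_unique <- !any(duplicated(toy_nodes$NODE))

# Each root's SIZE should equal the summed SIZE of its children
child_totals <- tapply(toy_edges$SIZE, toy_edges$FROM, sum)
roots_match  <- all(toy_roots$SIZE == child_totals[toy_roots$NODE])
```

If any of these checks came back `FALSE` on real data, the likely culprits would be a state code that collides with an employer ID, or rows dropped from one table but not the other.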
# Shared theme options; the "inconsolata" family must be available to the graphics device
theme_opts <- theme(
  text = element_text(family = "inconsolata"),
  legend.position = 'none'
)
# Label only the n largest leaf nodes: top and bottom bound their size range
n <- 15
top_x <- head(arrange(df_nodes_2, desc(SIZE)), n = n)
top <- top_x$SIZE[[1]]
bottom <- top_x$SIZE[[n]]
v1 <- ggraph(df_graph, layout = 'circlepack', weight = SIZE) +
  geom_node_circle(fill = "#F0F0F0") +
  # state labels on the root circles
  geom_node_label(aes(label = name, filter = depth == 0), size = 6, family = "inconsolata") +
  # employer labels (one ID part per line) for nodes in the top-n size range;
  # geom_text takes fontface, not face, for bold text
  geom_node_text(aes(label = gsub("_", "\n", name), filter = SIZE >= bottom & SIZE <= top),
                 size = 3, color = "#333333", family = "inconsolata", fontface = "bold") +
  coord_fixed() +
  theme_void() +
  theme_opts
girafe(ggobj = v1, width_svg = 1280/72, height_svg = 1280/72,
       options = list(opts_sizing(rescale = TRUE, width = 1.0)))
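The label filter above depends on the `top` and `bottom` thresholds, so it may help to see how they behave on a small case. The following is a rough sketch in base R with hypothetical node names and sizes. It uses `order()` in place of the `arrange()`/`head()` pair, and the `k` guard is an addition for illustration, not part of the original code. Two things stand out: indexing with `[[n]]` fails when there are fewer than `n` leaves, and ties at the lower threshold can produce more than `n` labels.

```r
# Hypothetical leaf sizes; names and values are illustrative only
leaves <- data.frame(
  NODE = c("A_CA_1", "B_CA_2", "C_NY_3", "D_TX_4", "E_WA_5"),
  SIZE = c(40L, 10L, 25L, 5L, 25L),
  stringsAsFactors = FALSE
)

n <- 2
# Guard (an assumption, not in the original): never index past the last row
k <- min(n, nrow(leaves))

# Rank leaves from largest to smallest and read off the size bounds
ranked <- leaves[order(-leaves$SIZE), ]
top    <- ranked$SIZE[[1]]
bottom <- ranked$SIZE[[k]]

# The same predicate used in the label filter
labelled <- leaves$NODE[leaves$SIZE >= bottom & leaves$SIZE <= top]
```

Here `n` is 2, but three nodes pass the filter because two leaves share the threshold size. Note also that in the plot the predicate is evaluated on every node, so a state whose total falls inside the same range would pick up a text label as well.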